ECG beats classification using waveform similarity and RR interval

Abstract: This paper presents an electrocardiogram (ECG)
beat classification method based on waveform similarity and RR
interval. The purpose of the method is to classify six types of
heart beats (normal beat, atrial premature beat, paced beat,
premature ventricular beat, left bundle branch block beat and
right bundle branch block beat). The electrocardiogram signal is
first denoised using wavelet-transform-based techniques. Heart
beats of 128 samples centered on the R peak are extracted
from the ECG signal and then reduced to 16 samples to
constitute a feature. The RR intervals surrounding the beat are
also used as features. A database of annotated beats is built
for the classifier so that unknown beats can be compared with
known waveforms. Tested on 46 records of the MIT/BIH arrhythmia
database, the method achieves a classification rate of 97.52%.

Index Terms: ECG beat classification, RR interval, wavelet
transform, patient adaptation.



I. Introduction 

With the introduction of the string galvanometer by
Willem Einthoven, the electrocardiogram (ECG) has
become one of the most important tools in the diagnosis of
heart diseases. The ECG is the graphical display of the electrical
activity of the heart recorded from electrodes on the body
surface [1]. From the plot of an ECG a cardiologist can
analyse the shape of the waveform and determine the nature
of the diseases afflicting the heart. The abnormal beats that
point to a particular disease can be rare and scattered
across a long record. Therefore, the work of the
cardiologist tracking down abnormalities can be tedious, and
computer-based diagnosis becomes very helpful.

Besides the fact that the ECG record can be noisy, the main
problem in computer-based classification is the wide variety
in the shape of beats belonging to the same class and the
similar shape of beats belonging to different classes [2], [3].
Algorithms for computer-based diagnosis therefore generally
consist of three steps: ECG beat detection, extraction of useful
features from the beats, and classification. For beat detection a
number of methods are available in the literature [?], [3]. Feature
extraction can be done in the time domain [6], in the frequency
domain [?], by multiscale decomposition [8], by multifractal
analysis [9] or by statistical means [?]. The classification
can be performed by neural networks [10], [11], a mixture of
experts [12], or a switchable scheme [13].

Although statistical methods of ECG beat recognition [2],
[13], [14] report good recognition accuracy, we believe, as
in the work of Yu Hen Hu [12], that a good approach to
ECG beat classification is to take into account the specificities
of each patient's electrocardiogram. The method proposed
in this paper is therefore a patient-specific classifier. The
wavelet transform has been successfully used in the processing
of non-stationary signals such as electrocardiograms [15]. It is
therefore employed in this study for ECG signal denoising and
beat length reduction. A beat database containing five classes
of annotated beats is created for the classifier. Each time a
patient's ECG beats have to be classified, the manually
annotated beats from the first five minutes of the ECG record
are integrated into that beat database, so the classifier's
experience grows each time an ECG record is submitted to it
for automatic annotation. The beats of the patient's ECG are
first clustered by similarity in the shape of their waveform, and
then each cluster is classified according to the greatest
similarity of its elements to the beats in the classifier's beat
database.



II. ECG SIGNAL PRE-PROCESSING 

The purpose of this work is to classify six types of heart
beats: normal beats (N), premature ventricular
contractions (PVC), paced beats (PB), atrial premature beats
(APB), left bundle branch block beats (LBBB) and right
bundle branch block beats (RBBB). A description of these
arrhythmias can be found in [1]. Forty-six (46) ECG signals
recorded with the Mason-Likar II lead (MLII) are taken from
the MIT/BIH arrhythmia database for the creation of the beat
database and the evaluation of the classifier.

The ECG records are generally noisy: they present
baseline wander and high-frequency noise. Techniques based
on the discrete wavelet transform are used to overcome these
disturbances.



A. Discrete Wavelet Transform 

The discrete wavelet transform is mainly based on the
multiresolution analysis of the wavelet transform introduced
by S. Mallat [16]. With the discrete wavelet transform any
function $x \in L^2(\mathbb{R})$ can be uniquely represented in terms of
an $L^2$-convergent series:

$$x(t) = \sum_{k} \alpha_{j_0 k}\,\phi_{j_0 k}(t) + \sum_{j=j_0}^{\infty} \sum_{k} \beta_{jk}\,\psi_{jk}(t), \tag{1}$$

where $\{\phi_{j_0 k}\}$ is an orthonormal system from the scaling
function and $\{\psi_{jk}\}$ an orthonormal system from the mother
wavelet, and

$$\alpha_{j_0 k} = \int x(t)\,\phi_{j_0 k}(t)\,dt, \qquad \beta_{jk} = \int x(t)\,\psi_{jk}(t)\,dt$$

are the wavelet coefficients. The cascade algorithm [16] allows
the computation of the lower level coefficients from the higher
level ones and vice versa:

$$\alpha_{jk} = \sum_{l} h_{l-2k}\,\alpha_{j+1,l} \quad \text{and} \quad \beta_{jk} = \sum_{l} \lambda_{l-2k}\,\alpha_{j+1,l}, \tag{2}$$

where $\lambda_k = (-1)^{k+1} h_{1-k}$, $j$ and $k$ are integers and $\{h_k\}$
are the filter coefficients of the chosen wavelet. The index $j$ indicates the
resolution level of the multiresolution analysis. The inverse
transformation is given by the following equation:

$$\alpha_{j+1,l} = \sum_{k} h_{l-2k}\,\alpha_{jk} + \sum_{k} \lambda_{l-2k}\,\beta_{jk}. \tag{3}$$
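As an illustration of the cascade algorithm, the sketch below performs one analysis step and the matching synthesis step in Python. It assumes the Haar wavelet, for which $h_0 = h_1 = 1/\sqrt{2}$ and the detail filter is $(-1/\sqrt{2}, 1/\sqrt{2})$; the paper does not state which wavelet was used, and the function names are illustrative only.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis_step(alpha_fine):
    """One cascade step: level j+1 approximation -> level j (approx, detail)."""
    if len(alpha_fine) % 2 != 0:
        raise ValueError("input length must be even")
    approx, detail = [], []
    for k in range(len(alpha_fine) // 2):
        a, b = alpha_fine[2 * k], alpha_fine[2 * k + 1]
        approx.append((a + b) / SQRT2)  # sum over l of h_{l-2k} * alpha_{j+1,l}
        detail.append((b - a) / SQRT2)  # sum over l of lambda_{l-2k} * alpha_{j+1,l}
    return approx, detail


def haar_synthesis_step(approx, detail):
    """Inverse step: level j (approx, detail) -> level j+1 approximation."""
    if len(approx) != len(detail):
        raise ValueError("approx and detail must have the same length")
    fine = []
    for a, d in zip(approx, detail):
        fine.append((a - d) / SQRT2)  # alpha_{j+1,2k}
        fine.append((a + d) / SQRT2)  # alpha_{j+1,2k+1}
    return fine


if __name__ == "__main__":
    samples = [4.0, 2.0, 5.0, 5.0]
    approx, detail = haar_analysis_step(samples)
    print(approx, detail)
    print(haar_synthesis_step(approx, detail))  # recovers the input up to rounding
```

Because the Haar system is orthonormal, applying the synthesis step to the output of the analysis step returns the original coefficients up to floating-point rounding.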



B. ECG signal noise reduction techniques 

ECG records are generally corrupted by noise from different
sources. The noise can appear as baseline wander and/or as a
high-frequency oscillation along the signal.

In this work, baseline wander is cancelled with the method
proposed in [17], chosen for its ability to eliminate baseline
drift without introducing distortion into the signal. The
high-frequency noise is dealt with using the
soft thresholding technique proposed by D. Donoho [18].

We have made a Java implementation of these two noise
reduction techniques and run the program on a noisy ECG
signal. The result is shown in Fig. 1.

Fig. 1. Noise cancellation result on record number 113, MLII, from the
MIT-BIH Arrhythmia Database: (a) Original signal (b) Denoised signal.
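The soft thresholding rule shrinks every wavelet detail coefficient toward zero by a fixed amount and zeroes those below the threshold. The Python sketch below shows the rule together with the widely used universal threshold $\sigma\sqrt{2\ln n}$, where $\sigma$ is estimated from the median absolute coefficient. Whether the authors' Java implementation chose the threshold this way is an assumption, and the function names are illustrative.

```python
import math
import statistics


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become 0."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        shrunk.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return shrunk


def universal_threshold(detail_coeffs):
    """Assumed threshold choice: sigma * sqrt(2 ln n), sigma = median(|d|) / 0.6745."""
    n = len(detail_coeffs)
    sigma = statistics.median(abs(d) for d in detail_coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    details = [3.0, -0.5, -2.0, 1.0]
    print(soft_threshold(details, 1.0))
    print(universal_threshold(details))
```

In a full denoising chain the rule would be applied to the detail coefficients $\beta_{jk}$ of each level before the inverse transform; the approximation coefficients are usually left untouched.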



III. Proposed method

A. Feature extraction

It is known that the morphology of the QRS complex,
along with the instantaneous RR interval (the interval between
two successive R peaks), plays an important role in the
diagnosis of heart diseases [13], [14]. For the classification of an ECG
beat, the ratio of the RR interval before it to the one after
it is a useful feature. For beats such as N, LBBB, RBBB
and PB this ratio is near or equal to 1, and for beats in the
APB and PVC classes this ratio is less than 1. But
sometimes PVCs or APBs occur in groups [1];
in such a case the ratio can be near or equal to 1, but the
RR interval will be very small compared to the mean RR
of non-premature beats in the ECG signal. The distinction
between a PVC and an APB is that a PVC is usually followed
by a compensatory pause, i.e., the RR interval between the two
QRS complexes enclosing the PVC equals twice the normal RR
interval; in contrast, APBs are usually followed by no
compensatory pause, i.e., the RR interval between the two QRS
complexes enclosing the APB is less than twice the normal RR
interval [1].

Finally, $RR_b$ and $RR_a$ (respectively the RR intervals before and
after the R peak), along with $RR_m$, the mean RR of
non-premature beats, are considered as features in this study.

The R peaks are taken from the annotation files of the MIT-BIH
arrhythmia database. From the ECG record, beats are
extracted using 128 samples centered at the R peak. After
that, the beat size is reduced to 16 samples by discrete wavelet
transform as follows. With the multiresolution approach, the
128-sample beat at resolution level $j_0$ is denoted by

$$b_{128}(t) = \sum_{k=0}^{2^{j_0}-1} \alpha_{j_0 k}\,\phi_{j_0 k}(t), \qquad \beta_{j_0 k} = 0, \tag{4}$$

and its decomposition to resolution level $j_0 - 3$ is

$$b_{128}(t) = \sum_{k=0}^{2^{j_0-3}-1} \alpha_{j_0-3,k}\,\phi_{j_0-3,k}(t) + \sum_{j=j_0-3}^{j_0} \sum_{k=0}^{2^{j}-1} \beta_{jk}\,\psi_{jk}(t), \tag{5}$$

where $\phi$ and $\psi$ are the scaling functions and wavelet functions at the
corresponding resolution level.

The beats at resolution level $j_0$ are characterized by
$2^{j_0} = 128$ samples per unit length, so $j_0 = 7$, and the last
equation becomes

$$b_{128}(t) = \sum_{k=0}^{2^{4}-1} \alpha_{4,k}\,\phi_{4,k}(t) + \sum_{j=4}^{7} \sum_{k=0}^{2^{j}-1} \beta_{jk}\,\psi_{jk}(t). \tag{6}$$

By setting all the detail coefficients $\beta_{jk}$ to zero we consider
the reduced beat

$$b(t) = \sum_{k=0}^{2^{4}-1} \alpha_{4,k}\,\phi_{4,k}(t) \tag{7}$$

as a feature in our method. The reduction of the beat size from
128 to 16 samples saves memory and speeds up
processing.


B. Classification 

Let $A$ be the orthonormal family of scaling functions $\{\phi_{4,k}\}$
at resolution level $2^4$. From equation (7) we see that $A$
generates a vector space $B \subset L^2(\mathbb{R})$ containing all the beats in
our study. Therefore the scalar product in $L^2(\mathbb{R})$ can also be
defined for every two vectors of $B$,
$|b\rangle = \sum_{\phi \in A} b_\phi |\phi\rangle$ and
$|b'\rangle = \sum_{\phi \in A} b'_\phi |\phi\rangle$, by
$\langle b | b' \rangle = \sum_{\phi \in A} \overline{b_\phi}\, b'_\phi$, where
$\overline{b_\phi}$ denotes the complex conjugate of $b_\phi$ (it coincides with
$b_\phi$ if it is real). We can define a similarity function on $B$ as
follows:

$$s : B \times B \to [0, 1], \qquad s(b, b') = \frac{|\langle b | b' \rangle|}{\|b\|\,\|b'\|}, \tag{8}$$

where $\|\cdot\|$ denotes a Hilbert norm on $B$. Applied to two beats, the
similarity function indicates the degree of proximity of their
shapes. Such similarity functions have been used previously in
the literature in many different contexts; see [19] for a recent
review.
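Since the scaling functions form an orthonormal family, the scalar product of two reduced beats equals the dot product of their 16 coefficient vectors. The Python sketch below computes a similarity under the assumption that it is the absolute value of the normalized scalar product, which maps into $[0, 1]$ as stated; real-valued coefficients are assumed and the function name is illustrative.

```python
import math


def similarity(u, v):
    """Absolute normalized scalar product of two real coefficient vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a flat beat has no shape to compare
    # Clamp to 1.0 to absorb floating-point rounding.
    return min(1.0, abs(dot) / (norm_u * norm_v))


if __name__ == "__main__":
    print(similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same shape, different scale
    print(similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal shapes
```

Under this definition two beats that differ only by a scale factor have similarity 1, and orthogonal coefficient vectors have similarity 0.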

As stated in [20], we believe that computer analysis cannot
substitute for a physician's interpretation of the ECG. Therefore our
classifier uses a database of known beats (reduced-length
versions of the original 128-sample beats) taken from the
first five minutes of each record in the MIT/BIH arrhythmia
database. The classifier's beat database contains five classes of
beats: N, LBBB, RBBB, PB and PVC. The APB
beat type is not present in that database because APBs can
be similar to N, LBBB and RBBB beats; their identification
is therefore mainly based on the ratio of the previous RR interval to
the following one ($RR_b / RR_a$). The beat database acts as
the classifier's knowledge. It grows each time a patient's
ECG record is processed, because the manually annotated beats
in the first five minutes are included in it. This practice
conforms to the procedure recommended by the AAMI (Association
for the Advancement of Medical Instrumentation), which allows the
use of the first 5 minutes of data in an ECG record to fine-tune
the classifier [12].

After the denoising step, the classification of a patient's
ECG record is done with the following steps:

(1) Beats are extracted using 128 samples centered on
the R peaks. For each beat the RR intervals before ($RR_b$) and
after ($RR_a$) its R peak are taken. Afterwards the beat length is
reduced to 16 samples using the discrete wavelet decomposition
described in the feature extraction section. The resulting beat
is normalized, to reduce waveform differences within the same
class, by subtracting the mean value and then dividing by the
standard deviation [13].

(2) The beats within the first 5 minutes of the ECG are
included in the classifier's beat database, and the mean RR
($RR_m$) of the beats in those 5 minutes is taken as a feature.

(3) The beats in the remaining 25 minutes are first
hierarchically clustered by similarity before classification.

(4) The class of a beat is identified using its highest
similarity to the beats in the classifier database.

(5) If the class is found to be N, LBBB or RBBB, the
ratio $RR_b / RR_a$ is calculated.
If $RR_b / RR_a < (1 - \epsilon_1)$ or $(RR_a + RR_b) < (2 RR_m - \epsilon_2)$,
then the class is changed to APB; otherwise the class is
unchanged. The optimal values of $\epsilon_1$ and $\epsilon_2$ will be
identified by experiment.

(6) The class of a cluster is the class of its element that has
the highest similarity to a beat in the classifier's beat database.
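Steps (4) to (6) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the similarity is taken as the absolute normalized scalar product, the database is a list of `(label, coefficients)` pairs, the sampling rate of 360 Hz is that of the MIT-BIH arrhythmia database, and the values passed for `eps1` and `eps2` are hypothetical because the paper leaves them to be determined experimentally.

```python
import math


def similarity(u, v):
    """Assumed similarity: absolute normalized scalar product of two real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norms == 0.0 else min(1.0, abs(dot) / norms)


def classify_cluster(cluster, database):
    """Steps (4) and (6): label of the database beat best matched by any cluster member."""
    best_label, best_score = None, -1.0
    for beat in cluster:
        for label, reference in database:
            score = similarity(beat, reference)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


def rr_features(r_peaks, i, fs=360.0):
    """RR intervals (seconds) before and after the i-th R peak, from sample indices."""
    rr_b = (r_peaks[i] - r_peaks[i - 1]) / fs
    rr_a = (r_peaks[i + 1] - r_peaks[i]) / fs
    return rr_b, rr_a


def apply_apb_rule(label, rr_b, rr_a, rr_m, eps1, eps2):
    """Step (5): relabel N, LBBB or RBBB beats as APB when the RR test fires."""
    if label in {"N", "LBBB", "RBBB"}:
        if rr_b / rr_a < 1.0 - eps1 or (rr_a + rr_b) < 2.0 * rr_m - eps2:
            return "APB"
    return label


if __name__ == "__main__":
    database = [("N", [1.0, 0.0, 0.0]), ("PVC", [0.0, 1.0, 0.0])]
    label = classify_cluster([[0.9, 0.2, 0.0], [0.7, 0.6, 0.0]], database)
    rr_b, rr_a = rr_features([0, 360, 540, 900], i=2)
    # eps1 and eps2 below are hypothetical placeholders.
    print(apply_apb_rule(label, rr_b, rr_a, rr_m=0.8, eps1=0.1, eps2=0.1))
```

The second condition of the rule catches premature beats that occur in groups, where the ratio $RR_b / RR_a$ stays close to 1 but both intervals are short compared with $RR_m$.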

IV. Results and discussion 

The classifier was tested on 25 minutes of data from
each 30-minute ECG signal recorded with the Mason-Likar
II lead (MLII) in the MIT/BIH arrhythmia database. The first
5 minutes of those ECG signals were used to build the beat
database used by the classifier. In each ECG record used in this
study, only beats in the classes normal beat (N), left bundle
branch block beat (LBBB), right bundle branch block beat
(RBBB), atrial premature beat (APB), premature ventricular
contraction (PVC) and paced beat (PB) are considered. The
RR intervals are calculated from the positions of the R peaks
documented in the annotation files of the MIT/BIH database.
The mean RR ($RR_m$) used in the identification of atrial
premature beats (APB) is calculated using the first 5 minutes
of the signal. For the clustering of beats before classification,
we consider that two beats belong to the same cluster if their
similarity is greater than or equal to 0.95.
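The clustering criterion can be illustrated with the Python sketch below. It assumes single-linkage grouping, in which two beats end up in the same cluster whenever a chain of pairwise similarities of at least 0.95 connects them; the paper does not specify the linkage, so this choice, the similarity definition and the function names are assumptions.

```python
import math


def similarity(u, v):
    """Assumed similarity: absolute normalized scalar product of two real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norms == 0.0 else min(1.0, abs(dot) / norms)


def cluster_beats(beats, threshold=0.95):
    """Group beat indices into single-linkage clusters using a union-find structure."""
    parent = list(range(len(beats)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(beats)):
        for j in range(i + 1, len(beats)):
            if similarity(beats[i], beats[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(beats)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    beats = [
        [1.0, 0.0, 0.0],
        [0.99, 0.05, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.98, 0.1],
    ]
    print(cluster_beats(beats))
```

The pairwise comparison is quadratic in the number of beats, which is one reason the reduction from 128 to 16 samples per beat helps keep the processing cost low.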



TABLE I
Classification results

The performance of the classifier is evaluated in terms
of the accuracy rate per ECG signal and the overall accuracy. The
classification results are summarized in Table I.

The overall classification rate is found to be 97.52%. This
is a good recognition accuracy given the finding reported
in [21] that the percentage of ECGs correctly classified by
computer programs has a median of 91.3%.

In some records (203, 222 and 223) the classification rate is
rather low. An examination of these records indicates a high
variation in the length of the RR intervals around normal beats.
This variation causes normal beats to be classified as APBs.
Another fact to take into consideration is that misclassification
can result from errors in the positions of the R peaks provided by
the manually annotated files of the MIT-BIH database. Our
similarity function can give bad results for two beats of the
same class when their R peaks are misaligned at the top of
their QRS complexes. We are currently developing a promising
method for detecting the summit of R peaks, for better
alignment when comparing two beats.

A minor difficulty in the application of our method can be
the creation of the beat database used by the classifier. We
think it is possible to find a way to reduce the 5 minutes
that a cardiologist has to annotate. But we remain convinced, as
in [21], that computer-based interpretation of the ECG is an
adjunct to the electrocardiographer, and that all computer-based
reports require physician overreading.

A comparison of our method with other works in the field of
automatic ECG beat classification is summarized in Table II.
However, comparing the efficiency of methods is not
straightforward because of differences in the test conditions (types of
beats to classify and number of beats used for testing).

TABLE II
Comparative results of different ECG beat classification methods

Method        Number of beat types   Accuracy (%)
ICA [13]      6                      99.51
FTNN [22]     3                      98
MOE [12]      4                      94
MRANN [8]     13                     96.79
FHNN [2]      7                      96.6
Our method    6                      97.52



It is interesting to note that some methods published in the
literature are not tested on as large a number of beats as ours.
For example, in the Independent Component Analysis method [13],
the authors used, per ECG signal, 100 beats for
training and 100 beats for testing, even when the signal contains
more than 2000 beats. Since the morphology of beats of the
same type changes not only from patient to patient but also
within the same patient [3], the number of beats used for testing can
affect the recognition accuracy. In Table I, if we discard
records 203, 222 and 223, which give poor results, the
overall classification rate of our method increases from 97.52%
to 98.53%.

V. Conclusion 

In this paper, we present a patient-adaptable ECG beat
classification method based on a similarity function and a
beat database which acts as the classifier's knowledge. The
discrete wavelet transform is also used for ECG signal
preprocessing. The method uses a simple approach and runs
at a low processing cost in comparison with methods using
neural networks or fuzzy logic. A promising accuracy in the
classification of six types of heart beats has been reached.